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The Wright-Fisher Fokker-Planck equation describes the stochastic dynamics of self-reproducing, 
competing variants at fixed population size. We use Fisher’s angular transformation, which defines 
a natural length for this stochastic process, to remove the co-ordinate dependence of it’s diffusive 
dynamics, resulting in simple Brownian motion in an unstable potential, driving variants to extinc¬ 
tion or fixation. This insight allows calculation of very accurate asymptotic formula for the Green’s 
function under neutrality and selection, using a novel heuristic Gaussian approximation. 


Understanding the interplay between stochastic and 
deterministic forces in systems with different reproduc¬ 
ing variants is a theme that arises, and has importance, 
in many different scientific fields ;I] including language 
evolution [2], protein evolution 0 ®, the evolution of bio¬ 
diversity EHZ3 and population genetics 00- This article 
is concerned with a fundamental question in population 
genetics, given the possibility of only two reproducing 
variants, how does the probability distribution of gene 
frequency x(t ) change over time, given it is known at 
a prior time point Xq = x(0), subject to small number 
fluctuations (genetic drift) and selection (competition), 
though the analogous question may be posed in any of 
these fields. We address this question in the context of 
the Wright-Fisher model, which is the canonical model of 
stochastic dynamics incorporating both these features. 

The diffusion approximation of the Wright-Fisher 
model describes the stochastic dynamics of gene fre¬ 
quency x (= n/N, where n is the number of copies of 
the mutant allele and N the total population): 
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where G(x,xo',t) is the transition probability density, or 
Green’s function, of gene frequency given an initial con¬ 
dition G(x,x o;0) = S(x — xq), and s is the selection co¬ 
efficient. This equation is derived in the large N limit 
from a Master equation of discrete populations of each 
variant at a fixed N M- 

This Fokker-Planck equation has been studied exten¬ 
sively. In particular, Kirnura calculated a series solution 
for the neutral equation in terms of Gegenbauer poly¬ 
nomials mm, which was later extended to the multi¬ 
allele case by Baxter, et al m- For the case of selec¬ 
tion Kirnura also calculate a series solution, however, the 
eigenvalues could not be represented in closed-form in 
terms of the population size N and selection coefficient 
s llOj . More recently, a number of methods have been 
developed to calculate the Green’s function under selec¬ 


tion, including a numerical matrix approach m and per¬ 
turbation theory based on a path-integral formulation of 
the Wright-Fisher process [14] , However, for many prac¬ 
tical applications, such as virus evolution, where popu¬ 
lation sizes are large and generation times short, these 
approaches are not very practical as a large number of 
terms is required for convergence at short times. The so¬ 
lution of Voronka and Keller |15| , which uses an asymp¬ 
totic ray approximation of the Green’s function, is valid 
at short times and for models of selection, neutrality and 
mutation, but their approach is not very intuitive and 
unwieldy requiring switching between different solutions 
in a time-dependent manner. We present a simple short- 
time asymptotic calculation of the Green’s function in 
closed form for both neutrality and selection, which has 
intuitive appeal as it exploits Fisher’s angular transfor¬ 
mation, which as we show is the natural co-ordinate for 
Wright-Fisher stochastic dynamics [16i . 



FIG. 1. Effective deterministic force in angular domain for 
Wright-Fisher process, where 2NF(ff) = — cot 6 + Ns sin 6. 


Fokker-Planck equations with co-ordinate dependent 
diffusion constants such as EqnJTjhave the property that 
space is explored at different rates dependent on the po¬ 
sition in the domain; using this intuition Antonelli et 
al pTO], suggested the natural definition of length for a 
stochastic process be related to the differential dd 2 = 
Hijdij dadd.T- 7 , where gij is a metric tensor and taken 
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to be the inverse of the covariance matrix g lJ . In one- 
dimension, this is simply d0 2 = d x 2 /D(x), which repre¬ 
sents the (differential) mean square distance traversed in 
equal times. For the diffusion constant of random drift, 
D(x) = ir(l — a;), the stochastic distance is simply 


r x dr' 

6= / = co S - 1 (l — 2 a"). ( 2 ) 

J yx'{± - r) 

This is Fisher’s angular transformation D21 UBI- Ap¬ 
plying this transformation to the neutral Fokker-Planck 
equation (s = 0 ), we get 
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or the equivalent stochastic differential equation mm 

ft (4) 

where ( rj(t )) = 0 and (r](t)r](t')) = S(t—t')/N. We see the 
result of this transformation is a co-ordinate independent 
diffusion constant in 0— space, but now with an effective 
deterministic force f{6) = cot(0)/2N. This arises due to 
the co-ordinate dependent diffusion constant x(l — x)/N 
in the a:—domain and drives the system towards regions 
of decreasing diffusion constant. It is also exactly the 
spurious drift term that arises in transforming between 
Ito and Stratonovich descriptions of stochastic dynamics 
[191 20). Note that in the Langevin representation, Eqnjl] 
would have a multiplicative noise term, which is trans¬ 
formed to additive noise in 8 —space in Eqnj4j Examining 
the force in Fig [I] we see that it is unstable, on average 
driving a mutant alelle to extinction if 0(0) < n/2 and fix¬ 
ation if 0(0) > 7 t/ 2, with a fixed point at 0 = tt/2. To cal¬ 
culate an approximate solution of the Green’s function, 
we make a Taylor expansion of the force about the fixed 
point 0 = 7r/2 to linear order to give a linear stochastic 
differential equation, 0 = ^ (0 — f) + v(t). As we will 
see this approximation works well even for initial frequen¬ 
cies of order 10% (x(0) = 0.1), due to the non-linearity 


of the angular transformation, which has the property of 
compressing the central range in x —space about x = 1/2 
to a smaller central region in 0 — space about 0 = n/2 
(for example, x = 0.1 —► 0 = 0.64 and x = 0.9 —> 
0 = 2.5). Equivalently, this is an harmonic approxima¬ 
tion of the effective potential function in 0 —space, where 
0 = dgU(0) + rj(t) and U(0) = In sin 0. As the re¬ 
sulting SDE is linear the solution is straightforwardly 
computed as 0 (t) = § + ( 0 o — §) e*/ 2Ar + f* dt'r](t')e i ^~, 
where 0o = 0(0). As the solution is a sum of Gaussian 
random variables 77 the Green’s function for 0 will be 
Gaussian with mean, (0(f)) = f + (0o — f) e‘/ 2Ar , since 
( 77 (f)) = 0 and variance, (( 0 2 (t))) = e t / N — 1 , where the 
van Kampen notation has been used, ((0 2 )) = (0 2 ) —(0) 2 . 
Note that the variance diverges for t N and the mean 
divergences for t 2 N, to —00 when 0 O < n/2 and to 
+00 for 0o > 7r/2 and is fixed for all time at (0) = 7 r/ 2 , if 
0q = tt/2 the fixed point of the deterministic dynamics. 
The Green’s function in 0—space is then 
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Note the similarity of form to the Green’s function of an 
overdamped harmonic oscillator, but with the difference 
that, as discussed, here the mean and variance diverge 
[211 . This solution does not obey the boundary condi¬ 
tions at 0 = 0 and 0 = tt, which are required to be 
absorbing and specifically to go linearly to zero at these 
points; this is in order for the solution in cc—space to be 
finite at the boundaries, as required due to the singular¬ 
ity of the diffusion constant at x = 0 and x = 1 12 ] . 
The method of images cannot be used in this case as the 
required image has it’s forces reversed and so does not 
obey the original Fokker-Planck equation. However, as 
we argue in the discussion, for many applications, includ¬ 
ing virus evolution, the short time behaviour (t <C N) is 
most relevant. Transforming back to space, we have, 
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where the Jacobian is | = 2/sin0 = \/yJx(l — x). 

The results are plotted in Fig]2j at various times and 


initial conditions, as solid lines and compared against 
numerical integration of the exact neutral Wright-Fisher 
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FIG. 2. Comparison of approximate calculation of neutral 
Greens function (solid lines - Eqnjfj]) and numerical inte¬ 
gration of stochastic differential equation that arises from 
diffusion approximation (solid circles), a) initial frequency 
*o = 0.01, b) xo = 0.1, c) xo = 0.5. 


stochastic differential equation (Eqn|4]|. We see that, 
in general, the approximation works very well for short 
times t <C N and when the initial frequency xq is 
not too small. More precisely we would expect the 
approximation to be good for sufficiently short times 
compared to the expected time to fixation, which is 
(t*) = —2N(x 0 ln(x 0 ) + (1 - Xq) ln(l - x 0 )); for x 0 = 
{0.01, 0.1, 0.5}, (t*) « {0.17V, 0.71V, 1.41V}, which is con¬ 
sistent with the results in Fig[2j The solution is more sim¬ 
ple and intuitive than that of Voronka et al, m , where 
it is clear the behaviour of gene frequencies is essentially 
that of Brownian motion in an unstable harmonic po¬ 
tential; the non-linearity in x—space arises purely from 
working in the more natural co-ordinates of the angular 
transformation, where in particular the argument of the 
exponential in the Gaussian solution is just the square 
of the stochastic distance between xq and x. For ex¬ 
ample, it is instructive that, as a consequence, even for 
a neutral process the mean of allele frequencies moves 
towards the extinction or fixation boundary for any ini¬ 


tial frequency Xq ^ 0.5, as is seen clearly from the plots 
of the Green’s functions in Fig[2] and from the solution 
of the mean (0(f)); this is not obvious from the neutral 
Wright-Fisher diffusion equation in x—space (Eqnfljwith 
s = 0). 

For the case of selection, from Eqn[l]and using Fisher’s 
angular transformation, the stochastic differential equa¬ 
tion for 0 is 

^ (cot(6») - lVssin(6»))+ r j(t), (7) 

where rj(t) has the same moments as before. Note that 
the contribution of selection to the effective force tends to 
zero as 6 —> {0,7r}, which agrees with the intuition that 
when an allele is rare, the change in allele frequency is 
dominated by drift; in particular, for 0 1, and Ns 

1, 2Nf(0) ~ — 1/6 + Nsd and the forces of drift and 
selection are roughly in balance when Ns ~ I/O 2 = l/4x, 
where Fisher’s angular transformation is 0 ss \f\x for 
small x - in other words when the allele frequency x <C 
(41Vs) _1 drift dominates. 

When selection is weak (Ns -C 1), the effective force 
in the angular domain is only a weak perturbation on 
the neutral force (Fi 80 and the Green’s functions dif¬ 
fer little from neutrality, particularly at short times (not 
shown). A similar linear expansion of the force can be 
carried out to calculate an asymptotic expression for the 
Green’s function under weak selection, as shown in the 
Supplementary Online Material; the resulting expression 
has similar accuracy compared to numerical simulations 
as in the neutral case. 

In the regime of strong selection Ns > 1, the above 
approach gives a poor approximation, due to the non¬ 
linearity of the effective force in the angular domain 
(Figjl]). Here we present a heuristic approach to solving 
EqjTJ approximately, for any value of Ns. The approach 
is to assume that the Green’s function of the non-linear 
SDE can be approximated by a Gaussian process, where: 
1) the time-varying mean is calculated as a solution to 
the deterministic dynamics of the SDE Eqj7J with initial 
condition 0q , which we show below can be calculated ex¬ 
actly; and 2) the time-varying variance (( 0 2 (t ))) is depen¬ 
dent on the local gradient of the force, which varies as a 
function of the deterministic solution, X = f'((0(t))). In 
general, if an exact solution is not available to the deter¬ 
ministic dynamics, an approximate solution that makes a 
linear approximation of the effective force (Fi 80 about 
the initial condition, also gives accurate results at short 
times (not shown). 

Transforming the deterministic part of Eqn[7] back to 
x—space, we have x = — ^(^(l — 2x) — 2Nsx(l — 
x), the solution to which is of the form x = C + 
Atanh (jt/2 + B). Transforming back to 0— space and 
using the initial condition 0q = (0(0)), the solution to 
the deterministic dynamics of EqnjT] is: 
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(6(t)) = cos 


1 + 27V7tanh (yt /2 — a) 
2iVs 


( 8 ) 


where a = tanh 1 (^ 2Ns 2 n^" +1 ^) anc ^ characteristic 

rate of change of the mean is 7 = V 1 + 4N 2 s 2 /2N. 

The next step is to calculate the variance, which we 
motivate by considering the situation when the slope of 
the effective force is fixed to a constant A, which gives a 
Gaussian solution with variance (( 9 2 (t ))) = 2 ^y(e 2At — 
1). The linearity of the force characterises the Gaussian 
distribution and so if we assume that the effective deter¬ 
ministic force varies slowly over a range of theta repre¬ 
senting the width of the probability density, we can then 
heuristically replace A with the local derivative of the ef¬ 
fective force A ({0(t))) in the variance. This approximates 
the local spreading of the probability density being solely 
due to the local derivative of the force giving a time vary¬ 
ing variance: 


((o 2 m 


1 f 2A mt)))t 

2N\mt))r 
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Note that for strong selection, the derivative of the ef¬ 
fective deterministic force \({8(t))) will be zero at cer¬ 
tain times, as can be seen from the plot of the deter¬ 
ministic force in Fig[l] at these time points it is sim¬ 
ple to see that the variance remains well behaved as 
lim^ol^ 2 ))} t/N, as one would expect if the de¬ 
terministic force tends to a constant. Transforming back 
to x— space, the Green’s function solution is: 


G x (*G x 0 , t) 


(cos 1 (1— 2x)— (f?)) 2 

eX P~ 2 ggg 

v / 2ttx(1 - x)((9 2 )) 
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where (6) and ({0 2 )) are given by Eqnsj8] and [ 9 J respec¬ 
tively, where cos($o) = 1 — 2xo- 
We plot the results for Ns = 10 (xq = {0.1,0.5, 0.9} 
in Fig|3] see Supplementary Online Information for plots 
with Xo = {0.01,0.99}) (Green’s functions for Ns = 1 
are plotted in the Supplementary Online Information). 
We find that for both Ns = 1 and Ns = 10 the heuristic 
approach and the integration of the Wright-Fislier SDE 
(Eqn{7| agree very well at short times compared to the 
average expected time for fixation/extinction of a mu¬ 
tant. This is true even when Xq is very close to 0 or 1 
(Supplementary Online Information) and is reasonably 
accurate to quite long times (t ~ r) for an initial fre¬ 
quency of Xq = 0.1 (Fig|3]A). In addition, we see that 
as well as capturing the broad behaviour of the time- 
varying mean and variance, the insets of the figures show 
the Green’s functions plotted on a log scale, demonstrat¬ 
ing that the approximation is also very accurate in the 
tails of the distribution at short times. Finally, for very 


long times when (8(t)) tends to 0 or 7 r, the variance of 
the heuristic solution Eqn|9] diverges, as A diverges at the 
boundaries, and the approximation fails; this is indicated 
in those cases where there is no heuristic solution plotted 
for a given time in each plot. 




FIG. 3. Comparison of approximate ca lcul ation of Greens 
function for Ns = 10 (solid lines - Eqn |10| ) and numerical 
integration of stochastic differential equation that arises from 
diffusion approximation (solid circles), a) initial frequency 
xo = 0.1, b) xo = 0.5, c) xo = 0.9. Green’s functions are 
plotted at times given by fractions of t = }(1 + In (Ns)), 
which is approximately the expected time to fixation of a 
mutant which survives drift and then is driven to fixation by 
selection [22\ . 


To conclude, we have calculated very accurate approx¬ 
imations of the 2-allele Green’s function (or transition 
probability density) of population genetics for arbitrary 
selection coefficient s and population size N. A key 
advantage and insight of the approach outlined in this 
paper, is that it transforms a non-linear Fokker-Planck 
equation to a simple problem of Brownian motion in an 
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effective potential. Together with the heuristic Gaussian 
approximation, this represents, to the author’s knowl¬ 
edge, a new general approach for asymptotically solving 
Fokker-Planck equation’s with a co-ordinate dependent 
diffusion constant in slowly-varying potentials (or equiva¬ 
lently SDEs with multiplicative noise), where the solution 
to the mean behaviour is known; indeed, in 1-dimension 
a PDE with co-ordinate dependent diffusion can always 
be transformed to one with co-ordinate independent dif¬ 
fusion mug. For more than two variants the methods 
detailed in m, suggests via the metric tensor, a poten¬ 
tial route to finding solutions in higher dimensions. 

These results have potential application to detecting 
selection in time-series data of the composition of vari¬ 
ants, in biological evolution, language evolution and for 
species in ecosystems. In particular, as these results 
have accuracy in the asymptotic short-time limit, they 
will be applicable to studying selection from time-series 
of variants (liaplotypes) in virus evolution, since they 
have large effective population sizes and short genera¬ 
tion times, meaning even sampling virus populations in¬ 
frequently (on the time scale of many months or years) 
would be accurately modelled by the results of this paper. 

I thank Richard A. Goldstein for initially suggesting 
the problem and for useful discussions. I also thank 
Richard Blythe for useful comments on the manuscript. 
This work was supported by The Francis Crick Institute 
which receives its core funding from Cancer Research 
UK, the UK Medical Research Council and the Wellcome 
Trust. 
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Supplemental Materials 

GREEN’S FUNCTION UNDER WEAK SELECTION 

For the case of selection, using Eqn.l in the main text Fisher’s angular transformation 9 = cos _1 (l — 2x), the 
stochastic differential equation for 9 is 


^ =-^( cot W--^ssin(6»))+r?(t), (SI) 

where (ry(t)) = 0 and (ri(t)ri(t')) = 5(t — t')/N. We first try to solve this by expanding the effective deterministic force, 

f (e) = ^(cot(0)~Nssm(6)), (S2) 

about its zero 9* and solve the resulting linear SDE as for the neutral case. This only works well for the case of 
weak selection, as the effective deterministic force becomes increasingly non-linear for all values of 9 when selection 
is strong (Fig.l main text). The solution to f(9) = 0 is 


= cos 


-l 


'-l + \/l + 4 N 2 s 2 
Ws 


which for weak selection, 4ZVs <C 1, is simply, 


9* ~ cos 1 (Ns). 


(S3) 


(S4) 


Intuitively, this makes sense, as for s > 0, this gives 9* < 7r/2, so the dividing point (separatrix) between initial 
conditions that result in deterministic dynamics giving fixation of the mutant allele is shifted to smaller values of 9q 
compared to neutral (9 0 = 7r/2); the converse is true, for s < 0, where 9* > 7r/2. The force expanded to linear order 
is f(9) = X(9 - 9*), where 


A = f(9*) 


±( _ ‘ 

2 N \ 1 — cos 2 (0*) 


Ns cos (9*) 


(S5) 


which again for weak selection is approximately 

a = fin « ^ (i + 2ivV ) ■ ( S6 ) 

The stochastic equation of motion is then 

^ t =X(9-9*)+v(t ) . (S7) 

The analysis then proceeds in the same way as for the neutral case giving the Green’s function in x —space as 


G x (x,x 0 ’,t) = 


NX 


ttx(1 — x)(e 2Xt — 1) 


exp - 


(cos 1 (1 —2a;)—cos 1 (1 — 2xo)e xt — 9* (1 — e xt ))‘ 


(e 2Xt - 1) /NX 


(S8) 


In the regime where Ns 1, stochastic simulations and 
Eqn|S8| agree well at short times with a similar accuracy 
(not shown) as shown in Fig.2 in the main text; the re¬ 
sults show the Green’s function under weak selection are 


only a small perturbation on the neutral Green’s function 
Eqn.6 in main text, and only diverge significantly for long 
times where this approximation, in any case fails. 
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SUPPLEMENTARY FIGURES 
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FIG. SI. Comparison of approximate calculation of Greens 
function for Ns = 1 (solid lines - Eqn.10 in main text) and 
numerical integration of stochastic differential equation that 
arises from diffusion approximation (solid circles), a) initial 
frequency xo = 0.01, b) xo = 0.99. 


We plot the results for Ns = 1 (xq = {0.01, 0.99} in 
Fig|sT| and xq = {0.1, 0.5, 0.9} in FigS2). As discussed 
in the main text, we find that for both Ns = 1 and 
Ns = 10 the heuristic approach and the integration of 
the Wright-Fisher SDE (Eqn.7 in main text) agree very 
well at short times compared to the average expected 
time for fixation/extinction of a mutant. 

For Ns = 1, we expect the mean time to fixa¬ 
tion/extinction to be of order ~ TV = 1/s and so the 
Green’s functions are plotted at different times t which 
are fractions of N. We see that the time range from zero 
that the approximation is accurate decreases as the ini¬ 
tial frequency Xq is nearer to either of the boundaries, 
but for sufficiently short times, even when Xq = 0.01 or 
xq = 0.99 (Fig SI), the heuristic solution is very accu¬ 
rate. The main difference in the Green’s functions for 
TVs = 1 and Ns = 10 are that the distributions are more 


narrow about their peak for Ns = 10, which is as ex¬ 
pected as under stronger selection as the dynamics will 
be more deterministic; we see that the heuristic approx¬ 
imation captures this behaviour very accurately. 

In Fig|S3[ we have plotted the approximate heuristic 
Green’s function for strong selection (Ns = 10), which 



FIG. S2. Comparison of approximate calculation of Greens 
function for Ns = 1 (solid lines - Eqn.10 in main text) and 
numerical integration of stochastic differential equation that 
arises from diffusion approximation (solid circles), a) initial 
frequency Xo = 0.1, b) xo = 0.5, c) Xo = 0.9. 


initial frequencies of Xq = {0.01,0.99}. We see that the 
approximation is again very good for sufficiently short 
times compared to the expect time to fixation or extinc¬ 
tion. 
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FIG. S3. Comparison of approximate calculation of Greens 
function for Ns = 10 (solid lines - Eqn.10 in main text) and 
numerical integration of stochastic differential equation that 
arises from diffusion approximation (solid circles), a) initial 
frequency xo = 0.01, b) xo = 0.99. Green’s functions are 
plotted at times given by fractions of r = |(1 + ln(fVs)), 
which is approximately the expected time to fixation of a 
mutant which survives drift and then is driven to fixation by 
selection. 















